Statistical Approaches to Predictive Modeling
Authors
Abstract
Prediction, i.e., predicting the potential values or value distributions of certain attributes for objects in a database or data warehouse, is an attractive goal in data mining. Predicting, with high quality, future events that are not recorded in databases can help users make smart business decisions. With both scalability and prediction quality in mind, we propose a predictive modeling algorithm for interactive prediction in large databases and data warehouses. The algorithm consists of three steps: (1) data generalization, which converts data in relational databases or data warehouses into a multi-dimensional database to which efficient analysis techniques can be applied; (2) relevance analysis, which identifies the attributes that are highly relevant to the prediction, reducing the number of attributes used and thereby improving both the efficiency and the reliability of prediction; and (3) construction of a statistical regression model, called a generalized linear model, for high-quality prediction. We explore two types of models that address different problems. Moreover, with this method, a user can interact with a data mining system by presenting probes with constants at different levels of abstraction, attempting to predict values of a predicted attribute at different levels of abstraction. A user may also drill down or roll up along any attribute dimension and then perform prediction analysis. Our analysis and experimental results show that the method provides high prediction quality with modest or intermediate data generalization and that it leads to efficient, interactive prediction in large databases.
Acknowledgments
I am deeply grateful to my senior supervisor, Dr. Jiawei Han, for all the assistance given to me since I enrolled in this graduate program, especially for his invaluable advice and guidance and for the time that he spent with me in the preparation of this thesis. I wish to express my gratitude to my committee members, Dr. Veronica Dahl and Dr. Qiang Yang, for their valuable and insightful comments and suggestions. I would like to extend my thanks to all other faculty members of the School of Computing Science for all the courses I have taken from them. My thanks also go to fellow graduate students, especially those who have been working in the Intelligent Database System Lab, for the help and friendship they have given me. I shall particularly mention Wan Gong, Jenny Chiang, Sonny Chee, and Shuhua Zhang for their great help throughout my research. I especially thank the School of Computing Science at Simon Fraser University for giving me the opportunity to study here. My sincere thanks go to Mrs. E. Krbavac, Mrs. K. Jaager, Mrs. C. Edwards, and Mrs. W. Davis for their assistance.
Dedication
To my parents
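As an illustration of the three steps listed in the abstract (generalization, relevance analysis, regression), the following is a minimal sketch and not the thesis's implementation. Everything in it is an assumption made for illustration: the toy rows, the city-to-region concept hierarchy, the use of information gain as the relevance measure, the salary threshold used to form classes, and the choice of the simplest generalized linear model (Gaussian family with identity link, i.e. ordinary least squares on a single indicator variable).

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical concept hierarchy (assumption): city -> region.
CITY_TO_REGION = {"Vancouver": "West", "Victoria": "West",
                  "Toronto": "East", "Ottawa": "East"}

# Toy data (assumption), not taken from the thesis.
ROWS = [
    {"city": "Vancouver", "colour": "red",  "salary": 30.0},
    {"city": "Victoria",  "colour": "blue", "salary": 32.0},
    {"city": "Toronto",   "colour": "red",  "salary": 40.0},
    {"city": "Ottawa",    "colour": "blue", "salary": 42.0},
]

def generalize(rows):
    """Step 1: replace low-level city values with their region."""
    return [{**r, "city": CITY_TO_REGION[r["city"]]} for r in rows]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Entropy reduction in `labels` obtained by splitting on `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def relevant_attributes(rows, attrs, labels, threshold=0.1):
    """Step 2: keep attributes whose information gain exceeds a threshold."""
    return [a for a in attrs if info_gain(rows, a, labels) > threshold]

def fit_identity_glm(xs, ys):
    """Step 3: Gaussian GLM with identity link on one predictor (closed-form OLS)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)

generalized = generalize(ROWS)
# Discretize the predicted attribute for relevance analysis (threshold is an assumption).
bands = ["high" if r["salary"] >= 36.0 else "low" for r in generalized]
kept = relevant_attributes(generalized, ["city", "colour"], bands)
# Encode the retained attribute as an indicator: 1.0 for East, 0.0 for West.
xs = [1.0 if r["city"] == "East" else 0.0 for r in generalized]
intercept, slope = fit_identity_glm(xs, [r["salary"] for r in generalized])
print(kept, intercept, intercept + slope)
```

In this toy setting the generalized city attribute separates the salary classes while colour does not, so only the former survives relevance analysis, and the fitted model simply recovers the mean salary of each region. Other families and link functions would require an iterative fitting procedure rather than the closed form used here.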
Similar resources
A Case on Predictive Modeling for Ayurveda Product Offerings
This paper deals with an overview of predictive modeling approaches adopted by marketers to identify segments in the target market which would have the largest response rates pertaining to a certain product or service being offered. The three approaches in the predictive modeling, which have been discussed are Heuristic approach, Statistical approach and Data Mining approach. However, with refe...
Predictive modeling of biomass production by Chlorella vulgaris in a draft-tube airlift photobioreactor
The objective of this study was to investigate the growth rate of Chlorella vulgaris for CO2 biofixation and biomass production. Six mathematical growth models (Logistic, Gompertz, modified Gompertz, Baranyi, Morgan and Richards) were used to evaluate the biomass productivity in continuous processes and to predict the following parameters of cell growth: lag phase duration (λ), maximum specific...
Statistical Modeling of Nuclear Systematics
Statistical modeling of data sets by neural-network techniques is offered as an alternative to traditional semiempirical approaches to global modeling of nuclear properties. New results are presented to support the position that such novel techniques can rival conventional theory in predictive power, if not in economy of description. Examples include the statistical inference of atomic masses a...
Predictive models in ecology: Comparison of performances and assessment of applicability
Ecological systems are governed by complex interactions which are mainly nonlinear. In order to capture the inherent complexity and nonlinearity of ecological, and in general biological systems, statistical models recently gained popularity. However, although these models, particularly connectionist approaches such as multilayered backpropagation networks, are commonly applied as predictive mod...
A survey on computational intelligence approaches for predictive modeling in prostate cancer
Predictive modeling in medicine involves the development of computational models which are capable of analysing large amounts of data in order to predict healthcare outcomes for individual patients. Computational intelligence approaches are suitable when the data to be modelled are too complex for conventional statistical techniques to process quickly and efficiently. These advanced approaches ...
An Infra-Structure for Performance Estimation and Experimental Comparison of Predictive Models in R
This document describes an infra-structure provided by the R package performanceEstimation that allows to estimate the predictive performance of different approaches (workflows) to predictive tasks. The infra-structure is generic in the sense that it can be used to estimate the values of any performance metrics, for any workflow on different predictive tasks, namely, classification, regression ...